Compressed timing indicators for media samples

ABSTRACT

A first frame of data is encoded and a first timestamp associated with the first frame of data is generated. The first timestamp includes complete timing information. The first frame of data and the associated first timestamp are transmitted to a destination. A second frame of data is encoded and a second timestamp associated with the second frame of data is generated. The second timestamp includes a portion of the complete timing information. The second frame of data and the associated second timestamp are then transmitted to the destination. Additional frames of data are encoded and additional timestamps associated with the additional frames of data are generated. The majority of the additional timestamps include a portion of the complete timing information.

RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 60/241,407 filed Oct. 18, 2000, the disclosure of which is incorporated by reference herein.

TECHNICAL FIELD

[0002] The present invention relates to video processing systems and, more particularly, to the compression of timing indicators associated with media samples.

BACKGROUND

[0003] The concept of recording and using timing information is fundamental to the needs of multimedia applications. Pictures, video, text, graphics, and sound need to be recorded with some understanding of the time associated with each sample of the media stream. This is useful for synchronizing different multimedia streams with each other, for carrying information to preserve the original timing of the media when playing a media stream, for identifying specific locations within a media stream, and for recording the time associated with the media samples to create a scientific or historical record. For example, if audio and video are recorded together but handled as separate streams of media data, then timing information is necessary to coordinate the synchronization of these two (or more) streams.

[0004] Typically, a media stream (such as a recorded audio track or recorded video or film shot) is represented as a sequence of media samples, each of which is associated (implicitly or explicitly) with timing information. A good example of this is video and motion picture film recording, which is typically created as a sequence of pictures, or frames, each of which represents the camera view for a particular short interval of time (e.g., typically 1/24 of a second for each frame of motion picture film). When this sequence of pictures is played back at the same number of frames per second (known as the “frame rate”) as used in the recording process, an illusion of natural movement of the objects depicted in the scene can be created for the viewer.

[0005] Similarly, sound is often recorded by regularly sampling an audio waveform to create a sequence of digital samples (for example, using 48,000 samples per second) and grouping sets of these samples into processing units called frames (e.g., 64 samples per frame) for further processing such as digital compression encoding or packet-network transmission (such as Internet transmission). A receiver of the audio data will then reassemble the frames of audio that it has received, decode them, and convert the resulting sequence of digital samples back into sound using electro-acoustic technology.

[0006] FIG. 1 illustrates a conventional system 100 for processing and distributing video content. The video content is captured using a video camera 102 (or any other video capture device) that transfers the captured video content onto video tape or another storage medium. Later, the captured video content may be edited using a video editor 104. A video encoder 106 encodes the video content to reduce the storage space required for the video content or to reduce the transmission bandwidth required to transmit the video content. Various encoding techniques may be used to compress the video content, such as the MPEG-2 (Moving Picture Experts Group 2nd generation) compression format.

[0007] The encoded video content is provided to a transmitter 108, which transmits the encoded video content to one or more receivers 110 across a communication link 112. Communication link 112 may be, for example, a physical cable, a satellite link, a terrestrial broadcast, an Internet connection, a physical medium (such as a digital versatile disc (DVD)) or a combination thereof. A video decoder 114 decodes the signal received by receiver 110 using an appropriate decoding technique. The decoded video content is then displayed on a video display 116, such as a television or a computer monitor. Receiver 110 may be a separate component (such as a set top box) or may be integrated into video display 116. Similarly, video decoder 114 may be a separate component or may be integrated into the receiver 110 or the video display 116.

[0008] Proper recording and control of timing information is needed to coordinate multiple streams of media samples, such as for synchronizing video and associated audio content. Even the use of media which does not exhibit a natural progression of samples through time will often require the use of timing information in a multimedia system. For example, if a stationary picture (such as a photograph, painting, or document) is to be displayed along with some audio (such as an explanatory description of the content or history of the picture), then the timing of the display of the stationary picture (an entity which consists of only one frame or sample in time) may need to be coordinated with the timing of the associated audio track.

[0009] Other examples of the usefulness of such timing information include being able to record the date or time of day at which a photograph was taken, or being able to specify editing or viewing points within media streams (e.g., five minutes after the camera started rolling).

[0010] In each of the above cases, a sample or group of samples in time of a media stream can be identified as a frame, or fundamental processing unit. If a frame consists of more than one sample in time, then a convention can be established in which the timing information represented for a frame corresponds to the time of some reference point in the frame such as the time of the first, last or middle sample.

[0011] In some cases, a frame can be further subdivided into even smaller processing units, which can be called fields. One example of this is in the use of interlaced-scan video, in which the sampling of alternating lines in a picture is separated so that half of the lines of each picture are sampled as one field at one instant in time, and the other half of the lines of the picture are then sampled as a second field a short time later. For example, lines 1, 3, 5, etc. may be sampled as one field of the picture, and then lines 0, 2, 4, etc. of the picture may be sampled as the second field a short time later (for example, 1/50th of a second later). In such interlaced-scan video, each frame can typically be separated into two fields.

[0012] Similarly, one could view a grouping of 64 samples of an audio waveform for purposes of data compression or packet-network transmission to be a frame, and each group of eight samples within that frame to be a field. In this example, there would be eight fields in each frame, each containing eight samples.

[0013] In some methods of using sampled media streams that are well known in the art, frames or fields may consist of overlapping sets of samples or transformations of overlapping sets of samples. Two examples of this behavior are the use of lapped orthogonal transforms [1) Henrique Sarmento Malvar, Signal Processing with Lapped Transforms, Boston, Mass., Artech House, 1992; 2) H. S. Malvar and D. H. Staelin, “The LOT: transform coding without blocking effects,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 553-559, Apr. 1989; 3) H. S. Malvar, Method and system for adapting a digitized signal processing system for block processing with minimal blocking artifacts, U.S. Pat. No. 4,754,492, June 1988.] and audio redundancy coding [1) J. C. Bolot, H. Crepin, A. Vega-Garcia: “Analysis of Audio Packet Loss in the Internet”, Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, pp. 163-174, Durham, April 1995; 2) C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J. C. Bolot, A. Vega-Garcia, S. Fosse-Parisis: “RTP Payload for Redundant Audio Data”, Internet Engineering Task Force Request for Comments RFC2198, 1997.]. Even in such cases it is still possible to establish a convention by which a time is associated with a frame or field of samples.

[0014] In some cases, the sampling pattern will be very regular in time, such as in typical audio processing in which all samples are created at rigidly-stepped times controlled by a precise clock signal. In other cases, however, the time between adjacent samples in a sequence may differ from location to location in the sequence.

[0015] One example of such behavior is when sending audio over a packet network with packet losses, which may result in some frames not being received by the decoder while other frames should still be played with their original relative timing. Another example of such behavior is in low-bit-rate videoconferencing, in which the number of frames sent per second is often varied depending on the amount of motion in the scene (since small changes take less data to send than large changes, and the overall channel data rate in bits per second is normally fixed).

[0016] If the underlying sampling structure is such that there is understood to be a basic frame or field processing unit sampling rate (although some processing units may be skipped), then it is useful to be able to identify a processing unit as a distinct counting unit in the time representation. If this is incorporated into the design, the occurrence of a skipped processing unit may be recognized by a missing value of the counting unit (e.g., if the processing unit count proceeds as 1, 2, 3, 4, 6, 7, 8, 9, . . . , then it is apparent that count number 5 is missing).
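The recognition of skipped processing units from gaps in the count can be illustrated with a minimal sketch. The function name find_skipped_counts is hypothetical and is used only for illustration; the sketch assumes the counts arrive in increasing order.

```python
def find_skipped_counts(counts):
    """Return the counting-unit values missing between consecutive received counts."""
    skipped = []
    for previous, current in zip(counts, counts[1:]):
        # Any integers strictly between two neighbours were never received.
        skipped.extend(range(previous + 1, current))
    return skipped


received = [1, 2, 3, 4, 6, 7, 8, 9]
print(find_skipped_counts(received))  # [5]
```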

[0017] If the underlying sampling structure is such that the sampling is so irregular that there is no basic processing unit sampling rate, then what is needed is simply a good representation of true time for each processing unit. Normally, however, in such a case there should at least be a common time clock against which the location of the processing unit can be referenced.

[0018] In either case (with regular or irregular sampling times), it is useful for a multimedia system to record and use timing information for the samples or frames or fields of each processing unit of the media content.

[0019] Different types of media may require different sampling rates. If timing information is always stored with the same precision, a certain amount of rounding error may be introduced by the method used for representing time. It is desirable for the recorded time associated with each sample to be represented precisely in the system with little or no such rounding error. For example, if a media stream operates at 30,000/1001 frames per second (the typical frame rate of North American standard NTSC broadcast video, approximately 29.97 frames per second) and the time values used in the system have a precision of 10⁻⁶ seconds (one microsecond), then although the time values may be very precise in human terms, it may appear to processing elements within the system that the precisely-regular sample timing (e.g., 1001/30,000 seconds per sample) is not precisely regular (e.g., 33,366 clock increment counts between samples, followed by 33,367 increments, then 33,367 increments, and then 33,366 increments again). This can cause difficulties in determining how to properly handle the media samples in the system.
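The apparent irregularity can be reproduced with a short sketch that rounds exact NTSC sample times to whole microsecond ticks. The names used here (FRAME_PERIOD, TICKS_PER_SECOND, tick_increments) are illustrative assumptions, and rounding to the nearest tick is only one possible choice; the exact order of the increments depends on the rounding rule and starting phase.

```python
from fractions import Fraction

FRAME_PERIOD = Fraction(1001, 30000)  # exact seconds per NTSC frame
TICKS_PER_SECOND = 1_000_000          # assumed clock precision: one microsecond


def tick_increments(n_frames):
    """Tick counts between consecutive frames when each time is rounded to a whole tick."""
    times = [round(k * FRAME_PERIOD * TICKS_PER_SECOND) for k in range(n_frames + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(tick_increments(6))  # [33367, 33366, 33367, 33367, 33366, 33367]
```

Although the underlying sampling is perfectly regular, the spacing seen by the system alternates between 33,366 and 33,367 ticks.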

[0020] Another problem in finding a method to represent time is that the representation may “drift” with respect to true time as would be measured by a perfectly ideal “wall clock”. For example, if the system uses a precisely-regular sample timing of 1001/30,000 seconds per sample and all samples are represented with incremental time intervals being 33,367 increments between samples, the overall time used for a long sequence of such samples will be somewhat longer than the true time interval: a total of about 0.86 seconds (roughly 26 frame periods) per day, accumulating to more than five minutes of error after a year of duration.
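The size of this drift can be checked with exact rational arithmetic. The following is a sketch only; the function name drift_seconds and its parameters are assumptions introduced for illustration, with the 33,367-tick interval and one-microsecond tick taken from the example above.

```python
from fractions import Fraction

TRUE_PERIOD = Fraction(1001, 30000)  # exact seconds per sample


def drift_seconds(elapsed_true_seconds, ticks_per_frame=33367, ticks_per_second=1_000_000):
    """Seconds by which the represented time runs ahead of true time."""
    represented_period = Fraction(ticks_per_frame, ticks_per_second)
    frames = Fraction(elapsed_true_seconds) / TRUE_PERIOD
    return float(frames * (represented_period - TRUE_PERIOD))


one_day = 24 * 60 * 60
print(drift_seconds(one_day))        # drift accumulated over one day
print(drift_seconds(365 * one_day))  # drift accumulated over one year
```

Each sample contributes one third of a microsecond of error, so the error grows linearly with the number of samples.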

[0021] Thus, “drift” is defined as any error in a timecode representation of sampling times that would (if uncorrected) tend to increase in magnitude as the sequence of samples progresses.

[0022] One example of a method of representing timing information is found in the SMPTE 12M design [Society of Motion Picture and Television Engineers, Recommended Practice 12M: 1999] (hereinafter called “SMPTE timecode”). SMPTE timecodes are typically used for television video data with timing specified in the United States by the National Television System Committee (NTSC) television transmission format, or in Europe, by the Phase Alternating Line (PAL) television transmission format.

[0023] SMPTE timecode is a synchronization signaling method originally developed for use in the television and motion picture industry to deal with video tape technology. The challenge originally faced with videotape was that there was no “frame accurate” way to synchronize devices for video or sound-track editing. A number of methods were employed in the early days, but because of the inherent slippage and stretching properties of tape, frame accurate synchronization met with limited success. The introduction of SMPTE timecode provided this frame accuracy and incorporated additional functionality. Additional sources on SMPTE include “The Time Code Handbook” by Cipher Digital Inc., which provides a complete treatment of the subject, as well as an appendix containing ANSI Standard SMPTE 12M-1986. Additionally, a text entitled “The Sound Reinforcement Handbook” by Gary Davis and Ralph Jones for Yamaha contains a section on timecode theory and applications.

[0024] The chief purpose of SMPTE timecode is to synchronize various pieces of equipment. The timecode signal is formatted to provide a system wide clock that is referenced by everything else. The signal is usually encoded directly with the video signal or is distributed via standard audio equipment. Although SMPTE timecode uses many references from video terminology, it may also be used for audio-only applications.

[0025] In many applications, a timecode source provides the signal while the rest of the devices in the system synchronize to it and follow along. The source can be a dedicated timecode generator, or it can be (and often is) a piece of the production equipment that provides timecode in addition to its primary function. An example of this is a multi-track audio tape deck that provides timecode on one track and sound for the production on other tracks. Video tape often makes similar use of a cue track or one of its audio sound tracks to record and play back timecode.

[0026] In other applications, namely video, the equipment uses timecode internally to synchronize multiple timecode sources into one. An example would be a video editor that synchronizes with timecode from a number of prerecorded scenes. As each scene is combined with the others to make the final product, their respective timecodes are synchronized with new timecode being recorded to the final product.

[0027] SMPTE timecode provides a unique address for each frame of a video signal. This address is an eight digit number, based on the 24 hour clock and the video frame rate, representing Hours, Minutes, Seconds and Frames in the following format:

HH:MM:SS:FF

[0028] The values of these fields range from 00 to 23 for HH, 00 to 59 for MM, 00 to 59 for SS, and 00 to 24 or 29 for FF (where 24 is the maximum for PAL 25 frame per second video and 29 is the maximum for NTSC 30,000/1001 frame per second video). By convention, the first frame of a day is considered to be marked as 00:00:00:01 and the last is 00:00:00:00 (one frame past the frame marked 23:59:59:24 for PAL and 23:59:59:29 for NTSC). This format represents a nominal clock time or the nominal duration of scene or program material, and makes approximate time calculations easy and direct.
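The HH:MM:SS:FF address format can be illustrated by a small sketch that converts a running frame count into such an address without any frame dropping. The function name frames_to_timecode is hypothetical; frame_index is assumed to be the number of frame periods elapsed since the address 00:00:00:00, and frames_per_second is the nominal integer count (25 for PAL, 30 for NTSC).

```python
def frames_to_timecode(frame_index, frames_per_second):
    """Format a running frame count as a nominal HH:MM:SS:FF address (no dropping)."""
    ff = frame_index % frames_per_second
    total_seconds = frame_index // frames_per_second
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = (total_seconds // 3600) % 24  # addresses wrap after 24 nominal hours
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


print(frames_to_timecode(25 * 86400 - 1, 25))  # 23:59:59:24
print(frames_to_timecode(30 * 3600 + 29, 30))  # 01:00:00:29
```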

[0029] The frame is the smallest unit of measure within SMPTE timecode and is a direct reference to the individual “picture” of film or video. The frame rate is the number of times per second that pictures are displayed to provide a rendition of motion. There are two standard frame rates (frames/sec) that typically use SMPTE timecode: 25 frames per second and 30,000/1001 frames per second (approximately 29.97 frames per second). The 25 frame per second rate is based on European video, also known as SMPTE EBU (PAL/SECAM color and b&w). The 30,000/1001 frame per second rate (sometimes loosely referred to as 30 frame per second) is based on U.S. NTSC color video broadcasting. Within the 29.97 frame per second use, there are two methods of using SMPTE timecode that are commonly used: “Non-Drop” and “Drop Frame”.

[0030] A frame counter advances one count for every frame of film or video, allowing the user to time events down to 1/25th or 1001/30,000th of a second.

[0031] SMPTE timecode is also sometimes used for a frame rate of exactly 30 frames per second. However, the user must take care to distinguish this use from the slightly slower 30,000/1001 frames per second rate of U.S. NTSC color broadcast video. (The adjustment factor of 1000/1001 originates from the method by which television signals were adjusted to provide compatibility between modern color video and the previous design for broadcast of monochrome video at 30 frames per second.)

[0032] Thus, the SMPTE timecode consists of the recording of an integer number for each of the following parameters for a video picture: Hours, Minutes, Seconds, and Frames. Each increment of the frame counter is understood to represent an increment of time of 1001/30,000 seconds in the NTSC system and 1/25 seconds in the PAL system.

[0033] However, since the number of frames per second in the NTSC system (30,000/1001) is not an integer, there is a problem of drift between the SMPTE 12M timecode representation of time and true “wall clock” time. This drift can be greatly reduced by a special frame counting method known as SMPTE “drop frame” counting. Without SMPTE drop frame counting, the drift between the SMPTE timecode's values of Hours, Minutes, and Seconds and the value measured by a true “wall clock” will accumulate more than 86 seconds of error per day. When using SMPTE drop frame counting, the drift accumulation magnitude can be reduced by a factor of about 1,000 (although the drift is still not entirely eliminated and the remaining drift per day is still more than two frame sampling periods).
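The effect of drop frame counting can be sketched as follows, assuming the commonly described rule in which frame labels 00 and 01 are skipped at the start of every minute whose number is not a multiple of ten. The function and constant names are illustrative assumptions, not part of SMPTE 12M.

```python
from fractions import Fraction

NOMINAL_FPS = 30                                          # frame labels per labelled second
FRAME_PERIOD = Fraction(1001, 30000)                      # true seconds per NTSC frame
FRAMES_PER_MINUTE = 60 * NOMINAL_FPS - 2                  # 1798 frames in a minute with drops
FRAMES_PER_TEN_MINUTES = 10 * 60 * NOMINAL_FPS - 9 * 2    # 17982 frames per ten minutes


def drop_frame_timecode(frame_index):
    """Label the frame_index-th frame (0-based) using drop frame counting."""
    blocks, rem = divmod(frame_index, FRAMES_PER_TEN_MINUTES)
    skipped = 18 * blocks  # nine dropping minutes per ten-minute block, two labels each
    if rem >= 60 * NOMINAL_FPS:
        # Past the first (non-dropping) minute: count the dropping minutes entered so far.
        skipped += 2 * ((rem - 60 * NOMINAL_FPS) // FRAMES_PER_MINUTE + 1)
    label = frame_index + skipped
    total_seconds = label // NOMINAL_FPS
    return "%02d:%02d:%02d:%02d" % (
        (total_seconds // 3600) % 24,
        (total_seconds // 60) % 60,
        total_seconds % 60,
        label % NOMINAL_FPS,
    )


def drift_per_nominal_day(drop_frame):
    """True seconds elapsed while the timecode advances 24 hours, minus 86,400."""
    frames = 144 * FRAMES_PER_TEN_MINUTES if drop_frame else 24 * 3600 * NOMINAL_FPS
    return float(frames * FRAME_PERIOD - 86400)


print(drop_frame_timecode(1799), drop_frame_timecode(1800))  # 00:00:59:29 00:01:00:02
print(drift_per_nominal_day(False), drift_per_nominal_day(True))
```

Under these assumptions, the non-drop count lags true time by about 86.4 seconds per day, while the drop frame count runs ahead by about 0.0864 seconds per day, which is roughly 2.6 frame periods.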

[0034] The SMPTE timecode has been widely used in the video production industry (for example, it is incorporated into the design of many video tape recorders). It is therefore very useful if any general media timecode design is maximally compatible with this SMPTE timecode. If such compatibility can be achieved, this will enable equipment designed for the media timecode to work well with other equipment designed specifically to use the SMPTE timecode.

[0035] Within this document, the following terminology is used. A timecode describes the data used for representing the time associated with a media sample, frame, or field. It is useful to separate the data of a timecode into two distinct types: the timebase and the timestamp. The timestamp includes the information that is used to represent the timing for a specific processing unit (a sample, frame, or field). The timebase contains the information that establishes the basis of the measurement units used in the timestamp. In other words, the timebase is the information necessary to properly interpret the timestamps. The timebase for a media stream normally remains the same for the entire sequence of samples, or at least for a very large set of samples.

[0036] For example, we may interpret the SMPTE timecode as having a timebase that consists of:

[0037] Knowledge of (or an indication of) whether the system is NTSC or PAL, and

[0038] Knowledge of (or an indication of) whether or not the system uses SMPTE “drop frame” counting in order to partially compensate for drift.

[0039] Given this, the timestamps then consist of the representations of the parameters Hours, Minutes, Seconds, and Frames for each particular video frame.

[0040] Many existing systems transmit all parameters of the timestamp with each frame. Since many of the parameters (e.g., hours and minutes) do not typically change from one frame to the next, transmitting all parameters of the timestamp with each frame results in the transmission of a significant amount of redundant data, and thus of more data than is necessary to communicate the current timing information.

[0041] The systems and methods described herein provide for the communication of timing indicators that convey timing information using a reduced amount of data.

SUMMARY

[0042] The systems and methods described herein provide for two different types of timestamps to be transmitted along with frames of data. A full timestamp includes complete timing information, such as hour information, minute information, second information, and a frame number. A compressed timestamp includes a portion of the complete timing information, such as the frame number. When a receiving device receives a compressed timestamp, the receiving device maintains the previous values of the timing parameters that are not contained in the compressed timestamp. Since most of the information in a full timestamp is redundant from one frame to the next, sending a significant number of compressed timestamps between full timestamps reduces the amount of data that is transmitted, but does not result in a loss of timing information.

[0043] In one embodiment, a first frame of data is encoded. A first timestamp is generated and associated with the first frame of data. The first timestamp includes complete timing information. The first frame of data and the associated first timestamp are then transmitted to a destination. A second frame of data is encoded and a second timestamp associated with the second frame of data is generated. The second timestamp includes a portion of the complete timing information. The second frame of data and the associated second timestamp are transmitted to the destination.

[0044] In another embodiment, multimedia content to be encoded is identified. The identified multimedia content is encoded into multiple frames of data. Full timestamps are generated and associated with a portion of the frames of data. Each full timestamp contains complete time information. Compressed timestamps are generated and associated with frames of data that are not associated with a full timestamp. Each compressed timestamp contains a portion of the complete time information.

[0045] In a described embodiment, the full timestamps include hour information, minute information, second information, and a frame number.

[0046] In a particular implementation, the compressed timestamps include a frame number.

BRIEF DESCRIPTION OF THE DRAWINGS

[0047] FIG. 1 illustrates a conventional system for processing and distributing video content.

[0048] FIG. 2 illustrates an example multimedia encoding system and an example multimedia decoding system.

[0049] FIG. 3 is a flow diagram illustrating a procedure for encoding multimedia content and transmitting timestamps and associated multimedia content frames.

[0050] FIG. 4 is a flow diagram illustrating a procedure for decoding multimedia content that includes multiple timestamps and associated content frames.

[0051] FIG. 5 illustrates an example of a suitable operating environment in which the systems and methods described herein may be implemented.

DETAILED DESCRIPTION

[0052] The systems and methods described herein utilize different types of timing indicators (referred to as timestamps) to communicate timing information along with frames of data. The use of both full timestamps and compressed timestamps reduces the amount of timing information that must be communicated with the frames of data. A full timestamp includes all timing information and is sent occasionally (e.g., a few times each second or once every X frames of data). Between full timestamps, a series of compressed timestamps are communicated with the frames of data. The compressed timestamps contain a subset of the complete timing information contained in the full timestamps. The compressed timestamp contains the timing information that has changed since the last full timestamp was sent.
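A receiving device's handling of the two timestamp types can be sketched as follows. This is a minimal illustration under stated assumptions, not the described decoder: the class name TimestampTracker, its method names, and the representation of a compressed timestamp as a set of named fields are all hypothetical.

```python
FIELDS = ("hours", "minutes", "seconds", "frames")


class TimestampTracker:
    """Keeps the most recent value of every timing parameter at the receiver."""

    def __init__(self):
        self.current = None  # nothing is known until a full timestamp arrives

    def apply_full(self, hours, minutes, seconds, frames):
        """A full timestamp replaces every stored parameter."""
        self.current = {"hours": hours, "minutes": minutes,
                        "seconds": seconds, "frames": frames}
        return dict(self.current)

    def apply_compressed(self, **changed):
        """A compressed timestamp overwrites only the parameters it carries."""
        if self.current is None:
            raise ValueError("compressed timestamp received before any full timestamp")
        unknown = set(changed) - set(FIELDS)
        if unknown:
            raise ValueError(f"unknown timing parameters: {sorted(unknown)}")
        self.current.update(changed)
        return dict(self.current)


tracker = TimestampTracker()
tracker.apply_full(hours=1, minutes=2, seconds=3, frames=28)
print(tracker.apply_compressed(frames=29))
# {'hours': 1, 'minutes': 2, 'seconds': 3, 'frames': 29}
```

The parameters that are absent from a compressed timestamp simply retain their previous values, so the complete time remains available for every frame.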

[0053] FIG. 2 illustrates an example multimedia encoding system and an example multimedia decoding system. A multimedia content source 202 provides multimedia content (e.g., audio content, video content, or combined audio and video content) to an encoder 204. Multimedia content source 202 may be, for example, a video camera, microphone or other capture device, or a storage device that stores previously captured multimedia content. Encoder 204 includes a clock 206 and a frame counter 208. Clock 206 is used to determine timestamp information and synchronize operation of encoder 204. Frame counter 208 keeps track of consecutive frame numbers associated with frames of data. Encoder 204 also includes an encoding engine 210, which encodes multimedia content and other data (such as timestamp information) into multiple frames. The output of encoder 204 is communicated to a transmitter 212, which transmits the encoded content to one or more receivers. Alternatively, transmitter 212 may be a storage device that stores the encoded content (e.g., on a DVD, magnetic tape, or other storage device).

[0054] Receiver 220 receives an encoded signal including one or more frames and communicates the received signal to a decoder 222. Alternatively, receiver 220 may be a device (such as an audio player and/or a video player) capable of reading stored encoded content (e.g., stored on a DVD or other storage device). Decoder 222 includes a clock 224 and a counter 226. Clock 224 aids in synchronizing decoder 222. Counter 226 is used to assign frame identifiers to received frames of data. Decoder 222 also includes a decoding engine 228, which decodes the received signal. After decoding the received signal, decoder 222 communicates the decoded content to a multimedia player 230, which renders the multimedia content defined by the decoded signal. Multimedia player 230 may be an audio player (e.g., a CD player), a video player (e.g., a DVD player), or a combination audio player and video player. Decoder 222 may be a separate device or may be incorporated into another device, such as a television or a DVD player.

[0055] FIG. 3 is a flow diagram illustrating a procedure 300 for encoding multimedia content and transmitting timestamps and associated multimedia content frames. Initially, procedure 300 identifies a number of basic units per second in a reference clock (block 302), which is represented by a parameter labeled “base_ups”. In a particular example, the reference clock has 30,000 basic units per second (also referred to as 30,000 hertz). The procedure then identifies a number of basic units of the reference clock per media sample period (block 304), which is represented by a parameter labeled “base_upp”. In a particular example, each increment of a counter (such as a frame counter) occurs after 1001 increments of the reference clock. In this example, if the counter advances by five, the reference clock advances by 5005. Expressing time in counter increments reduces the amount of data that needs to be communicated regarding the clock (i.e., sending “5” instead of “5005”).
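The relationship between the counter, base_upp, and base_ups can be written out as a short sketch. The function names are hypothetical; base_ups and base_upp take the example values of 30,000 and 1001 given above.

```python
from fractions import Fraction


def reference_clock_units(sample_count, base_upp=1001):
    """Reference-clock units elapsed after sample_count media sample periods."""
    return sample_count * base_upp


def sample_time_seconds(sample_count, base_ups=30000, base_upp=1001):
    """Exact elapsed time in seconds for sample_count media sample periods."""
    return Fraction(reference_clock_units(sample_count, base_upp), base_ups)


print(reference_clock_units(5))  # 5005
print(sample_time_seconds(5))    # 1001/6000
```

Because the result is kept as an exact fraction, no rounding error is introduced when converting a counter value to a time.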

[0056] The procedure 300 then identifies a counting type that defines the manner in which samples (or frames) are counted (block 306). Additional details regarding the various counting types are provided below. At block 308, the base_ups, base_upp, and counting type data are transmitted to one or more receivers. These data values allow each receiver to understand and properly decode subsequent frames of data.

[0057] Next, the procedure receives multimedia content to be encoded and creates a first content frame (block 310). The first content frame is created by encoding a portion of the received multimedia content. The procedure then transmits a full timestamp along with the first content frame (block 312). The full timestamp may be embedded within the first content frame or transmitted separately, but along with the first content frame. The full timestamp includes the hour, minute, second, and frame number associated with the first content frame.

[0058] The procedure then creates the next content frame by encoding the next portion of the received multimedia content (block 314). At block 316, the procedure determines whether to transmit a full timestamp or a compressed timestamp. As mentioned above, a full timestamp includes all time-related information (i.e., the hour, minute, second, and frame number associated with the content frame being transmitted). The compressed timestamp includes a subset of the information required for a full timestamp. In a particular implementation, the compressed timestamp contains the information that has changed since the last timestamp (either full or compressed) was transmitted to the receivers. Typically, the compressed timestamp includes the frame number associated with the current content frame being transmitted. The compressed timestamp reduces the amount of data that must be transmitted when compared with the full timestamp. In a particular implementation, the full timestamp is sent several times each second. In an alternate implementation, the full timestamp is sent every X frames, where X is approximately 15.

[0059] In another implementation, the decision of whether to send a full timestamp or compressed timestamp is adjusted dynamically based on an estimate of the reliability of the communication link between transmitter and receiver. If the estimated reliability of the communication link is high, then full timestamps may be sent less frequently. However, if the communication link is not expected to be reliable, the full timestamps are sent more frequently.

[0060] If the procedure determines that a full timestamp should be transmitted, a full timestamp is transmitted along with the next content frame (block 318). Otherwise, a compressed timestamp is transmitted along with the next content frame (block 320). The procedure continues by returning to block 314 to create the next content frame and determine whether a full timestamp or a compressed timestamp is to be transmitted along with the next content frame.
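The decision made at block 316 can be sketched as a simple loop. The sketch assumes a fixed interval between full timestamps and represents each timestamp as a dictionary; the function names plan_timestamps and full_interval_for, the linear reliability policy, and its interval bounds are all hypothetical choices made for illustration.

```python
def full_interval_for(reliability, min_interval=1, max_interval=30):
    """Hypothetical policy: a more reliable link allows more frames between full timestamps.

    reliability is an estimate in the range 0.0 (unreliable) to 1.0 (reliable).
    """
    reliability = min(1.0, max(0.0, reliability))
    return max(1, round(min_interval + reliability * (max_interval - min_interval)))


def plan_timestamps(timestamps, full_interval=15):
    """Decide, frame by frame, whether to send a full or a compressed timestamp."""
    plan = []
    previous = None
    for index, ts in enumerate(timestamps):
        if previous is None or index % full_interval == 0:
            plan.append(("full", dict(ts)))
        else:
            # Always carry the frame number, plus any other parameter that changed.
            changed = {k: v for k, v in ts.items() if k == "frames" or previous[k] != v}
            plan.append(("compressed", changed))
        previous = ts
    return plan


frames = [
    {"hours": 1, "minutes": 2, "seconds": 3, "frames": 28},
    {"hours": 1, "minutes": 2, "seconds": 3, "frames": 29},
    {"hours": 1, "minutes": 2, "seconds": 4, "frames": 0},
]
for kind, payload in plan_timestamps(frames):
    print(kind, payload)
```

In this sketch the first frame always carries a full timestamp, matching block 312, and a compressed timestamp grows only when a parameter other than the frame number changes.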

[0061] In a particular embodiment, the data that specifies the timebase and the starting timestamp of a sequence of data samples (or frames) is sent using the following pseudo-code:

send (base_ups) // unsigned integer
send (base_upp) // unsigned integer
send (counting_type) // defined in Table 1
send (full_timestamp_sequence_flag) // boolean
send (discontinuity_flag) // boolean
send (count_dropped) // boolean
send (frames_value) // integer
if (counting_type != ‘000’)
    send (offset_value) // integer
send (seconds_value) // integer
send (minutes_value) // integer
send (hours_value) // integer

[0062] These data specify the time of the first sample of a sequence of frames and specify the timebase necessary for interpretation of the parameters of each individual timestamp. Since these data specify both the timebase and the initial timestamp for an entire sequence of frames, they are referred to herein as the sequence header information for this particular embodiment. In one embodiment, a full timestamp is included in each sequence header. Alternatively, the sequence headers may not contain a full timestamp. Instead, the data contained in a full timestamp is retrieved from the full timestamp associated with the first frame of data following the sequence header.

[0063] The base_ups, base_upp, and counting_type parameters are discussed above. Table 1 below defines the various counting_type values.

TABLE 1

Value   Meaning
000     No dropping of frames_value count values and no use of offset_value
001     No dropping of frames_value count values
010     Dropping of individual zero values of frames_value count
011     Dropping of individual max_pps values of frames_value count
100     Dropping of the two lowest (values 0 and 1) frames_value counts when seconds_value is zero and minutes_value is not an integer multiple of ten
101     Dropping of unspecified individual frames_value count values
110     Dropping of unspecified numbers of unspecified frames_value count values
111     Reserved

[0064] Particular parameters are defined as follows:

[0065] full_timestamp_sequence_flag: Indicates whether every timestamp in the following sequence of timestamps shall be fully specified or whether some timestamps (referred to as compressed timestamps) may only contain partial information (depending on memory of values sent previously in the sequence header or in a frame timestamp). If full_timestamp_sequence_flag is “1”, then full_timestamp_flag must be “1” in the timestamp information for every frame in the following sequence.

[0066] discontinuity_flag: Indicates whether the time difference that can be calculated between the starting time of the sequence and the time indicated for the last previous transmitted frame can be interpreted as a true time difference. Shall be “1” if no previous frame has been transmitted.

[0067] count_dropped: Indicates, if discontinuity_flag is “0”, whether some value of frames_value was skipped after the last previous transmitted frame to reduce drift between the time passage indicated in the seconds_value, minutes_value, and hours_value parameters and those of a true clock.

[0068] frames_value, offset_value, seconds_value, minutes_value, and hours_value: Indicate the parameters to be used in calculating an equivalent timestamp for the first frame in the sequence. Shall be equal to the corresponding values of these parameters in the header of the first frame after the sequence header, if present in the sequence header.

[0069] In this embodiment, an extra signed-integer parameter called offset_value is used in addition to the unsigned integer frames_value, seconds_value, minutes_value, and hours_value parameters that are used by the SMPTE timecode's timestamp, in order to relate the time of a sample precisely relative to true time, as shown in a formula below.

[0070] In a particular embodiment, the timestamp structure sending process for the timestamps on individual media samples (or frames) is implemented using the following pseudo-code:

    send (full_timestamp_flag) // boolean
    send (frames_value) // unsigned integer
    if (counting_type != ‘000’) {
        if (full_timestamp_flag)
            send (offset_value) // signed integer
        else {
            send (offset_value_flag) // boolean
            if (offset_value_flag)
                send (offset_value) // signed integer
        }
        if (counting_type != ‘001’)
            send (count_dropped_flag) // boolean
    }
    if (full_timestamp_flag) {
        send (seconds_value) // unsigned integer 0..59
        send (minutes_value) // unsigned integer 0..59
        send (hours_value) // unsigned integer
    } else {
        send (seconds_flag) // boolean
        if (seconds_flag) {
            send (seconds_value) // unsigned integer 0..59
            send (minutes_flag) // boolean
            if (minutes_flag) {
                send (minutes_value) // unsigned integer 0..59
                send (hours_flag) // boolean
                if (hours_flag)
                    send (hours_value) // unsigned integer
            }
        }
    }
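For purposes of illustration, the nesting of the optional fields in the above pseudo-code may be sketched in Python as shown below. The function name and the convention that a value of None means "not sent" are assumptions of this sketch; because of the nesting, a minutes value is only sent when a seconds value is sent, and an hours value only when a minutes value is sent.

```python
def timestamp_fields(counting_type, full_timestamp_flag, frames_value,
                     offset_value=None, count_dropped_flag=False,
                     seconds_value=None, minutes_value=None, hours_value=None):
    """Return the per-frame timestamp fields, in sending order, as (name, value) pairs."""
    out = [("full_timestamp_flag", int(full_timestamp_flag)),
           ("frames_value", frames_value)]
    if counting_type != "000":
        if full_timestamp_flag:
            out.append(("offset_value", offset_value))
        else:
            out.append(("offset_value_flag", int(offset_value is not None)))
            if offset_value is not None:
                out.append(("offset_value", offset_value))
        if counting_type != "001":
            out.append(("count_dropped_flag", int(count_dropped_flag)))
    if full_timestamp_flag:
        # A full timestamp always carries seconds, minutes, and hours.
        out += [("seconds_value", seconds_value),
                ("minutes_value", minutes_value),
                ("hours_value", hours_value)]
    else:
        # A compressed timestamp carries each coarser field only if flagged.
        out.append(("seconds_flag", int(seconds_value is not None)))
        if seconds_value is not None:
            out.append(("seconds_value", seconds_value))
            out.append(("minutes_flag", int(minutes_value is not None)))
            if minutes_value is not None:
                out.append(("minutes_value", minutes_value))
                out.append(("hours_flag", int(hours_value is not None)))
                if hours_value is not None:
                    out.append(("hours_value", hours_value))
    return out
```

In this sketch, the shortest compressed timestamp (counting_type ‘000’, no seconds update) consists of only full_timestamp_flag, frames_value, and seconds_flag.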

[0071] If any timestamp is incomplete (i.e., full_timestamp_flag is zero and at least one of seconds_flag, minutes_flag, hours_flag, and offset_value_flag is present and zero), the last prior sent value for each missing parameter is used. An equivalent time specifying the time of a media sample (in units of seconds) may be computed as follows:

[0072] equivalent_time = 60 × (60 × hours_value + minutes_value) + seconds_value + (base_upp × frames_value + offset_value) / base_ups
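As an illustrative sketch, the above formula may be evaluated exactly with rational arithmetic in Python. The function name is an assumption of this sketch, and the timebase values used in the example that follows (base_ups of 30000 and base_upp of 1001) are chosen for illustration only.

```python
from fractions import Fraction


def equivalent_time(hours_value, minutes_value, seconds_value,
                    frames_value, offset_value, base_ups, base_upp):
    """Return the equivalent time of a media sample, in seconds, as an exact Fraction."""
    whole_seconds = 60 * (60 * hours_value + minutes_value) + seconds_value
    sub_second = Fraction(base_upp * frames_value + offset_value, base_ups)
    return whole_seconds + sub_second
```

For example, with hours_value 1, minutes_value 2, seconds_value 3, frames_value 15, and offset_value 0 under the assumed timebase, the whole-second part is 60 × (60 × 1 + 2) + 3 = 3723 and the fractional part is 1001 × 15 / 30000 = 1001/2000, giving an equivalent time of 3723.5005 seconds.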

[0073] Using the timebase parameters, a derived parameter is defined as:

[0074] max_pps = ceil(base_ups / base_upp)

[0075] where ceil(x) is defined as the function of an argument x, which, for non-negative values of x, is equal to x if x is an integer and is otherwise equal to the smallest integer greater than x. The value of frames_value should not exceed max_pps.
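By way of example, max_pps may be computed with integer arithmetic alone, which avoids any floating-point rounding. The sketch below is in Python; the function name is an assumption of this sketch, and positive integer timebase parameters are assumed.

```python
def max_pps(base_ups, base_upp):
    """Return ceil(base_ups / base_upp) for positive integers, using integer division only."""
    return -(-base_ups // base_upp)
```

For instance, illustrative timebase values of base_ups = 30000 and base_upp = 1001 give a ratio of approximately 29.97, so max_pps is 30.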

[0076] If count_dropped_flag is ‘1’, then:

[0077] if counting_type is ‘010’, frames_value shall be ‘1’ and the value of frames_value for the last previous transmitted frame shall not be equal to ‘0’ unless a sequence header is present between the two frames with discontinuity_flag equal to ‘1’.

[0078] if counting_type is ‘011’, frames_value shall be ‘0’ and the value of frames_value for the last previous transmitted frame shall not be equal to max_pps unless a sequence header is present between the two frames with discontinuity_flag equal to ‘1’.

[0080] if counting_type is ‘100’, frames_value shall be ‘2’ and the seconds_value shall be zero and minutes_value shall not be an integer multiple of ten and frames_value for the last previous transmitted frame shall not be equal to ‘0’ or ‘1’ unless a sequence header is present between the two frames with discontinuity_flag equal to ‘1’.

[0081] if counting_type is ‘101’ or ‘110’, frames_value shall not be equal to one plus the value of frames_value for the last previous transmitted frame modulo max_pps unless a sequence header is present between the two frames with discontinuity_flag equal to ‘1’.
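The foregoing constraints may be checked, by way of a hedged sketch, with a Python function such as the one below. The function name and its parameters are assumptions of this sketch. The sketch further assumes one reading of the rules: the "unless" clause is taken to relax only the condition on the previous frame's frames_value, the expression for counting types ‘101’ and ‘110’ is read as (previous frames_value + 1) modulo max_pps, and counting types that do not drop counts are treated as never permitting count_dropped_flag to be ‘1’.

```python
def count_dropped_is_valid(counting_type, frames_value, prev_frames_value,
                           max_pps, seconds_value=0, minutes_value=0,
                           discontinuity=False):
    """Check a timestamp whose count_dropped_flag is '1' against the rules above.

    discontinuity is True when a sequence header with discontinuity_flag
    equal to '1' lies between the previous frame and this one.
    """
    if counting_type == "010":
        return frames_value == 1 and (discontinuity or prev_frames_value != 0)
    if counting_type == "011":
        return frames_value == 0 and (discontinuity or prev_frames_value != max_pps)
    if counting_type == "100":
        return (frames_value == 2
                and seconds_value == 0
                and minutes_value % 10 != 0
                and (discontinuity or prev_frames_value not in (0, 1)))
    if counting_type in ("101", "110"):
        expected_next = (prev_frames_value + 1) % max_pps
        return discontinuity or frames_value != expected_next
    return False  # assumed: no dropped counts for the remaining counting types
```

Under this reading, for counting_type ‘100’ with max_pps of 30, a frame with frames_value 2 that follows a frame with frames_value 29 at the start of minute 1 would be accepted, whereas the same transition at the start of minute 10 would be rejected.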

[0082] As the degree of precision for the various parameters of each media sample timestamp becomes coarser, the inclusion of the further information needed to place the timestamp within the more global scale is optional. Any coarse-level context information that is not sent is implied to have the same value as the last transmitted parameter of the same type. The finely-detailed information necessary to locate the precise sample time relative to that of neighboring samples is included with every timestamp, but as the degree of coarseness of the time specification becomes higher, the inclusion of further more coarse context information is optional in order to reduce the average amount of information that is required to be communicated.

[0083] FIG. 4 is a flow diagram illustrating a procedure 400 for decoding multimedia content that includes multiple timestamps and associated content frames. At block 402, the procedure receives base_ups, base_upp, and counting_type data associated with a multimedia stream from a transmitting device. This information allows the receiving system to properly interpret and decode the subsequently received content. Next, a full timestamp and an associated first multimedia content frame are received (block 404). The full timestamp provides the hours, minutes, seconds, and frame number associated with the first received frame, and, in a particular embodiment, a time offset number allowing a drift-free precise relation to be determined between the time computed from the other parameters and the true time of the sample.

[0084] The procedure 400 then receives a next multimedia content frame and an associated timestamp (block 406). An associated flag (full_timestamp_flag) indicates whether the timestamp is a full timestamp or a compressed timestamp. The procedure decodes the multimedia content frame (block 408) and determines (based on the full_timestamp_flag) whether the timestamp is a full timestamp or a compressed timestamp (block 410). If the timestamp is a full timestamp, the procedure updates all timing parameters provided by the full timestamp (block 412). If the timestamp is a compressed timestamp, the procedure updates the frame parameter (block 414). The system uses the values from the most recent full timestamp for all other timing parameter values. Alternatively, if a compressed timestamp is received, the procedure updates all timing parameters contained in the compressed timestamp.

[0085] After updating one or more timing parameters, the procedure returns to block 406 to receive and process the next multimedia content frame and associated timestamp.
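The parameter-update step of procedure 400 may be sketched, for illustration only, as the following Python function. It follows the alternative in which every timing parameter contained in a received timestamp is updated and every parameter that is not received keeps its last prior value. The function name, the dictionary representation of a timestamp, and the error raised for an incomplete full timestamp are assumptions of this sketch.

```python
TIMING_FIELDS = ("frames_value", "offset_value", "seconds_value",
                 "minutes_value", "hours_value")


def apply_timestamp(state, received):
    """Return the decoder's timing state after one received timestamp.

    state    -- dict of the last known timing parameters
    received -- dict holding only the fields that were actually sent
    """
    if received.get("full_timestamp_flag"):
        required = ("frames_value", "seconds_value", "minutes_value", "hours_value")
        missing = [name for name in required if name not in received]
        if missing:
            raise ValueError(f"full timestamp is missing {missing}")
    updated = dict(state)  # unsent parameters keep their last prior values
    for name in TIMING_FIELDS:
        if name in received:
            updated[name] = received[name]
    return updated
```

For example, after a full timestamp of 1 hour, 0 minutes, 59 seconds, frame 29, a compressed timestamp carrying only frames_value 0, seconds_value 0, and minutes_value 1 would leave hours_value at 1.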

[0086] FIG. 5 illustrates an example of a suitable computing environment 500 within which the video encoding and decoding procedures may be implemented (either fully or partially). The computing environment 500 may be utilized in the computer and network architectures described herein.

[0087] The exemplary computing environment 500 is only one example of a computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the computer and network architectures. Neither should the computing environment 500 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary computing environment 500.

[0088] The video encoding and decoding systems and methods described herein may be implemented with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on. Compact or subset versions may also be implemented in clients of limited resources.

[0089] The computing environment 500 includes a general-purpose computing device in the form of a computer 502. The components of computer 502 can include, but are not limited to, one or more processors or processing units 504, a system memory 506, and a system bus 508 that couples various system components including the processor 504 to the system memory 506.

[0090] The system bus 508 represents one or more of several possible types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus also known as a Mezzanine bus.

[0091] Computer 502 typically includes a variety of computer readable media. Such media can be any available media that is accessible by computer 502 and includes both volatile and non-volatile media, removable and non-removable media.

[0092] The system memory 506 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 510, and/or non-volatile memory, such as read only memory (ROM) 512. A basic input/output system (BIOS) 514, containing the basic routines that help to transfer information between elements within computer 502, such as during start-up, is stored in ROM 512. RAM 510 typically contains data and/or program modules that are immediately accessible to and/or presently operated on by the processing unit 504.

[0093] Computer 502 may also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example, FIG. 5 illustrates a hard disk drive 516 for reading from and writing to a non-removable, non-volatile magnetic media (not shown), a magnetic disk drive 518 for reading from and writing to a removable, non-volatile magnetic disk 520 (e.g., a “floppy disk”), and an optical disk drive 522 for reading from and/or writing to a removable, non-volatile optical disk 524 such as a CD-ROM, DVD-ROM, or other optical media. The hard disk drive 516, magnetic disk drive 518, and optical disk drive 522 are each connected to the system bus 508 by one or more data media interfaces 526. Alternatively, the hard disk drive 516, magnetic disk drive 518, and optical disk drive 522 can be connected to the system bus 508 by one or more interfaces (not shown).

[0094] The disk drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for computer 502. Although the example illustrates a hard disk 516, a removable magnetic disk 520, and a removable optical disk 524, it is to be appreciated that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like, can also be utilized to implement the exemplary computing system and environment.

[0095] Any number of program modules can be stored on the hard disk 516, magnetic disk 520, optical disk 524, ROM 512, and/or RAM 510, including by way of example, an operating system 526, one or more application programs 528, other program modules 530, and program data 532. Each of the operating system 526, one or more application programs 528, other program modules 530, and program data 532 (or some combination thereof) may include elements of the video encoding and/or decoding algorithms and systems.

[0096] A user can enter commands and information into computer 502 via input devices such as a keyboard 534 and a pointing device 536 (e.g., a “mouse”). Other input devices 538 (not shown specifically) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are connected to the processing unit 504 via input/output interfaces 540 that are coupled to the system bus 508, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).

[0097] A monitor 542 or other type of display device can also be connected to the system bus 508 via an interface, such as a video adapter 544. In addition to the monitor 542, other output peripheral devices can include components such as speakers (not shown) and a printer 546 which can be connected to computer 502 via the input/output interfaces 540.

[0098] Computer 502 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computing device 548. By way of example, the remote computing device 548 can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and so on. The remote computing device 548 is illustrated as a portable computer that can include many or all of the elements and features described herein relative to computer 502.

[0099] Logical connections between computer 502 and the remote computer 548 are depicted as a local area network (LAN) 550 and a general wide area network (WAN) 552. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

[0100] When implemented in a LAN networking environment, the computer 502 is connected to a local network 550 via a network interface or adapter 554. When implemented in a WAN networking environment, the computer 502 typically includes a modem 556 or other means for establishing communications over the wide network 552. The modem 556, which can be internal or external to computer 502, can be connected to the system bus 508 via the input/output interfaces 540 or other appropriate mechanisms. It is to be appreciated that the illustrated network connections are exemplary and that other means of establishing communication link(s) between the computers 502 and 548 can be employed.

[0101] In a networked environment, such as that illustrated with computing environment 500, program modules depicted relative to the computer 502, or portions thereof, may be stored in a remote memory storage device. By way of example, remote application programs 558 reside on a memory device of remote computer 548. For purposes of illustration, application programs and other executable program components such as the operating system are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 502, and are executed by the data processor(s) of the computer.

[0102] An implementation of the system and methods described herein may result in the storage or transmission of data, instructions, or other information across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise “computer storage media” and “communications media.” “Computer storage media” include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.

[0103] “Communication media” typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Communication media also includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.

[0104] Alternatively, portions of the systems and methods described herein may be implemented in hardware or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) or programmable logic devices (PLDs) could be designed or programmed to implement one or more portions of the video encoding or video decoding systems and procedures.

[0105] Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.

1. A method comprising: encoding a first frame of data; generating a first timestamp associated with the first frame of data, wherein the first timestamp includes complete timing information; transmitting the first frame of data and the associated first timestamp to a destination; encoding a second frame of data; generating a second timestamp associated with the second frame of data, wherein the second timestamp includes a portion of the complete timing information; and transmitting the second frame of data and the associated second timestamp to the destination.
2. A method as recited in claim 1 further comprising: encoding a third frame of data; generating a third timestamp associated with the third frame of data, wherein the third timestamp includes a portion of the complete timing information; and transmitting the third frame of data and the associated third timestamp to the destination.
3. A method as recited in claim 1 further comprising: identifying timing information related to transmitting the first and second frames of data; and transmitting the timing information to the destination.
4. A method as recited in claim 1 wherein the first timestamp includes hour information, minute information, second information, and a frame number.
5. A method as recited in claim 1 wherein the first timestamp includes an offset value that is used to relate the time associated with a frame of data to true time.
6. A method as recited in claim 1 wherein the second timestamp includes a frame number.
7. A method as recited in claim 1 further comprising: encoding a plurality of frames of data; and generating additional timestamps associated with each of the plurality of frames of data, wherein the majority of the additional timestamps include a portion of the complete timing information.
8. A method as recited in claim 1 further comprising: encoding a plurality of frames of data; generating a full timestamp associated with one of the plurality of frames of data, wherein the full timestamp includes the complete timing information; and generating a plurality of compressed timestamps associated with the frames of data that are not associated with the full timestamp, wherein the compressed timestamps include a portion of the complete timing information.
9. One or more computer-readable memories containing a computer program that is executable by a processor to perform the method recited in claim 1.
10. A method comprising: identifying multimedia content to be encoded; encoding the identified multimedia content into a plurality of frames of data; generating a plurality of full timestamps associated with a portion of the frames of data, wherein each full timestamp contains complete time information; and generating a plurality of compressed timestamps associated with frames of data that are not associated with a full timestamp, wherein each compressed timestamp contains a portion of the complete time information.
11. A method as recited in claim 10 wherein the full timestamps are associated with every Xth frame of data.
12. A method as recited in claim 10 wherein the full timestamps are associated with frames of data spaced apart by a predetermined time period.
13. A method as recited in claim 10 wherein the full timestamps include hour information, minute information, second information, and a frame number.
14. A method as recited in claim 10 wherein the full timestamps include an offset value that is used to relate the time associated with a frame of data to true time.
15. A method as recited in claim 10 wherein the compressed timestamps include a frame number.
16. A method as recited in claim 10 further comprising storing the frames of data and the associated timestamps.
17. A method as recited in claim 10 further comprising transmitting the frames of data and the associated timestamps to a plurality of destinations.
18. One or more computer-readable memories containing a computer program that is executable by a processor to perform the method recited in claim 10.
19. A method comprising: receiving a first frame of data; receiving a first timestamp associated with the first frame of data, wherein the first timestamp includes complete timing information for the first frame of data; receiving a second frame of data; and receiving a second timestamp associated with the second frame of data, wherein the second timestamp includes a portion of the timing information.
20. A method as recited in claim 19 further comprising decoding the first frame of data and the second frame of data.
21. A method as recited in claim 19 further comprising: receiving a third frame of data; receiving a third timestamp associated with the third frame of data, wherein the third timestamp includes a portion of the timing information; and decoding the third frame of data.
22. A method as recited in claim 19 further comprising receiving timing information related to the manner in which frames of data are transmitted from a data source.
23. A method as recited in claim 19 wherein the first timestamp is a full timestamp and the second timestamp is a compressed timestamp.
24. A method as recited in claim 19 wherein receiving the first timestamp includes updating all timing parameters with the information contained in the first timestamp.
25. A method as recited in claim 19 wherein receiving the second timestamp includes updating timing parameters with the information contained in the second timestamp.
26. One or more computer-readable memories containing a computer program that is executable by a processor to perform the method recited in claim 19.
27. One or more computer-readable media having stored thereon a computer program that, when executed by one or more processors, causes the one or more processors to: encode a first frame of data; generate a first timestamp associated with the first frame of data, wherein the first timestamp includes complete time information; encode a plurality of subsequent frames of data; and generate a plurality of subsequent timestamps, wherein each of the subsequent timestamps includes a portion of the time information.
28. One or more computer-readable media as recited in claim 27 wherein the complete time information includes hour information, minute information, second information, and a frame number.
29. One or more computer-readable media as recited in claim 27 wherein each of the subsequent timestamps includes a frame number.
30. An apparatus comprising: an encoded multimedia content source; and a decoder coupled to receive encoded multimedia content from the encoded multimedia content source, wherein the video content includes a first frame of data having an associated first timestamp, such that the first timestamp includes complete timing information for the first frame of data, and wherein the encoded multimedia content includes a second frame of data having an associated second timestamp, such that the second timestamp includes a subset of the timing information included in the first timestamp.
31. An apparatus as recited in claim 30 wherein the decoder is configured to decode the first frame of data and the second frame of data.